Respondent driven sampling—where we are and where should we be going?
Abstract
Respondent Driven Sampling (RDS) is a novel variant of link-tracing sampling that has primarily been used to estimate the characteristics of hard-to-reach groups, such as the HIV prevalence of drug users. ‘Seeds’ are selected by convenience from a population of interest (the target population) and given coupons. Seeds then use these coupons to recruit other people, who themselves become recruiters. Recruits are given compensation, usually money, for taking part in the survey and also an incentive for recruiting others. This process continues in recruitment ‘waves’ until the survey is stopped. Estimation methods are then applied to account for the biased recruitment (for example, the presumed over-recruitment of people with more acquaintances), in an attempt to generate estimates for the underlying population. RDS has quickly become popular and is relied on by major public health organisations, including the US Centers for Disease Control and Prevention and Family Health International. This is chiefly because it is often found to be an efficient method of recruitment in hard-to-reach groups, but also because custom-written software is available that incorporates inference methods designed to generate estimates representative of the wider population of interest, despite the biased sampling. RDS's popularity demonstrates that there was a clear need for new methods of data collection on hard-to-reach groups. However, RDS has not been without its critics. Its reliance on the target population for recruitment raises ethical and sampling concerns. If RDS estimates are overly biased or their variance is unacceptably high, then RDS will be little more than another method of convenience sampling. If these errors can be minimised, however, RDS has the potential to become a very useful survey methodology.
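The recruitment process described above can be sketched as a small simulation. This is a minimal illustration under stated assumptions, not a description of any real RDS protocol: the contact network, the function name `simulate_rds` and its parameters are hypothetical; every coupon is assumed to be redeemed by a randomly chosen contact who is not already in the sample; and the survey is assumed to stop at a fixed sample size.

```python
import random
from collections import deque


def simulate_rds(adjacency, seeds, coupons=3, max_sample=200, rng=None):
    """Simulate coupon-based peer recruitment in waves (hypothetical sketch).

    adjacency: dict mapping each person to a list of their contacts.
    seeds: people selected by convenience; they form wave 0.
    coupons: number of coupons each participant receives.
    Returns a dict mapping each participant to (wave, recruiter).
    """
    rng = rng or random.Random()
    sample = {s: (0, None) for s in seeds}
    queue = deque(seeds)  # first-in, first-out, so earlier waves recruit first
    while queue and len(sample) < max_sample:
        recruiter = queue.popleft()
        wave = sample[recruiter][0]
        # Assumption: only contacts not yet surveyed can accept a coupon,
        # and every coupon handed out is redeemed.
        eligible = [c for c in adjacency[recruiter] if c not in sample]
        for recruit in rng.sample(eligible, min(coupons, len(eligible))):
            if len(sample) >= max_sample:
                break  # the survey is stopped
            sample[recruit] = (wave + 1, recruiter)
            queue.append(recruit)
    return sample


# Usage on a hypothetical random network of 500 people.
rng = random.Random(1)
n_people = 500
adjacency = {p: [] for p in range(n_people)}
for i in range(n_people):
    for j in range(i + 1, n_people):
        if rng.random() < 0.02:
            adjacency[i].append(j)
            adjacency[j].append(i)

sample = simulate_rds(adjacency, seeds=[0, 1, 2], rng=rng)
pop_mean_degree = sum(len(c) for c in adjacency.values()) / n_people
sample_mean_degree = sum(len(adjacency[p]) for p in sample) / len(sample)
print(f"population mean degree: {pop_mean_degree:.1f}")
print(f"sample mean degree:     {sample_mean_degree:.1f}")
```

Because people enter the sample through their contacts, the sample mean degree is expected to come out higher than the population mean degree; this is the over-recruitment of well-connected people that the estimation methods try to correct. The exact numbers depend on the random network and the seed.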
In this editorial we highlight that ‘RDS’ includes both data collection and statistical inference methods, discuss the limitations of current RDS inference methods for generating representative estimates, highlight other applications of RDS for which it may be more reliable, propose and request feedback on a draft RDS reporting checklist, and finally suggest priority areas for RDS research.
As commonly described, RDS is actually a collection of methods for carrying out two primary tasks: sampling a population, and statistical inference to generate population estimates. A custom-written computer package, ‘RDSAT’, has been released to assist with data handling, tabulation and inference. The RDS method of sampling is often efficient, with samples usually accruing quickly and with minimal perceived need for intervention by project staff, and it has led to the collection of a wealth of data. Lessons learned from the design and implementation of RDS have been shared and coalesced into standard protocols for data collection. However, the performance of the inference methods is far less certain. There is much disagreement and confusion about the suitability and utility of the current methods of statistical inference, and therefore about the ability of RDS to generate representative data. Current inference methods rely on multiple assumptions about the sampling process, most of which may not be met in practice. In retrospect, therefore, it might seem reasonable to expect that RDS estimates are likely to suffer from (perhaps large) error. Unfortunately, we do not really know whether this is true, because there have been few robust evaluations. This is partly because such evaluations are methodologically challenging to carry out: the representative or total-population data they require are generally unavailable for hard-to-reach groups (hence the need for RDS).
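The editorial does not say which estimator it has in mind, but one widely used example of the weighting idea is the RDS-II (Volz-Heckathorn) estimator, which weights each respondent by the inverse of their self-reported network size (degree). The sketch below assumes, as that estimator does, that each person's chance of being recruited is proportional to their degree and that degrees are reported accurately; these are exactly the kind of sampling assumptions that may not hold in practice. The function name and inputs are illustrative.

```python
def rds_ii_estimate(outcomes, degrees):
    """Inverse-degree-weighted (RDS-II style) estimate of a population proportion.

    outcomes: 1 if the respondent has the trait (e.g. HIV-positive), else 0.
    degrees: each respondent's reported number of contacts in the target population.
    Assumes the probability of being sampled is proportional to degree,
    so each respondent is weighted by 1 / degree.
    """
    if len(outcomes) != len(degrees):
        raise ValueError("outcomes and degrees must have the same length")
    if any(d <= 0 for d in degrees):
        raise ValueError("every degree must be positive")
    weights = [1 / d for d in degrees]
    weighted_cases = sum(w * y for w, y in zip(weights, outcomes))
    return weighted_cases / sum(weights)


# Two well-connected respondents with the trait, two less-connected without it.
outcomes = [1, 1, 0, 0]
degrees = [10, 10, 2, 2]
print(sum(outcomes) / len(outcomes))       # unadjusted sample proportion: 0.5
print(rds_ii_estimate(outcomes, degrees))  # adjusted: (0.1 + 0.1) / 1.2, about 0.167
```

When all degrees are equal the weights cancel and the estimate equals the unadjusted proportion. The adjustment only moves the estimate closer to the truth if its assumptions hold, which is why evaluating these estimators matters.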
The most convincing studies that do exist (Goel et al also provide a useful summary of other evaluations) suggest that (i) RDS samples may indeed suffer from bias, and this bias may be difficult to detect; (ii) the current inference methods do not reduce these biases; and, perhaps most importantly, (iii) estimates probably have higher variance than initially thought. This last finding matters because it means that sample sizes in the thousands may be required to achieve the levels of precision currently assumed to be obtainable from sample sizes in the hundreds, which would make RDS studies substantially larger, longer and more expensive than is common practice. The practical implication is that, when interpreting RDS surveys that make statements about the wider population beyond the sample, readers should assume that CIs are too narrow and should not assume that the adjusted estimates are more representative than the unadjusted ones. Readers should also consider the unadjusted estimates and how representative they might be of the wider population. That said, generating representative estimates is one of the most difficult things we could ask of RDS. Other potential applications for the RDS method, or for data collected using RDS, include risk factor identification, social network data collection, population size estimation and implementation of interventions. These other applications require fewer (or no) sampling assumptions to be met. RDS might not be a panacea, but it could still be the best method to collect data on many hard-to-reach groups. However, it is critical that the concerns about statistical inference be addressed, and that the current benefits and limitations of RDS be better communicated to the broader public health community. Another concern is that RDS studies are currently not being adequately reported.
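The link between variance and sample size can be made concrete with the usual design-effect calculation. If an RDS estimate has design effect D relative to simple random sampling, the sample needed for a confidence interval of half-width h around a proportion p is roughly n = D · z² · p(1 − p) / h², i.e. D times the simple-random-sampling size. The sketch below uses this normal approximation; the function name, prevalence, half-width and design effects are hypothetical values chosen only to show the scaling, not figures from the studies cited.

```python
import math


def required_sample_size(p, half_width, design_effect, z=1.96):
    """Approximate sample size for a CI of +/- half_width around a proportion p.

    Uses the normal approximation n_srs = z^2 * p * (1 - p) / half_width^2
    for simple random sampling, then inflates it by the design effect.
    z = 1.96 corresponds to a 95% confidence interval.
    Assumption: the design effect is supplied by the user; for RDS it is
    itself uncertain and hard to estimate.
    """
    n_srs = z ** 2 * p * (1 - p) / half_width ** 2
    return math.ceil(design_effect * n_srs)


# Hypothetical planning scenario: 20% prevalence, CI of +/- 5 percentage points.
for deff in (1, 2, 10):
    print(deff, required_sample_size(p=0.20, half_width=0.05, design_effect=deff))
```

With these illustrative inputs the loop prints `1 246`, `2 492` and `10 2459`: a design effect of 10 turns a sample of a few hundred into one of a few thousand.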
Ultimately, this reduces …
Department of Infectious Disease Epidemiology, Faculty of Epidemiology and Population Health, London School of Hygiene and Tropical Medicine, London, UK
Division of HIV/AIDS Prevention, National Center for HIV/AIDS, Viral Hepatitis, STD and TB Prevention, Centers for Disease Control and Prevention, Atlanta, Georgia, USA
Microsoft Research, New York, USA
World Bank, USA
Division of Global HIV/AIDS, Center for Global Health, Centers for Disease Control and Prevention, Atlanta, Georgia, USA
Department of Veterinary Medicine, University of Cambridge, Cambridge, UK